Approximate Counting: A Detailed Analysis

Author

  • Philippe Flajolet
Abstract

Approximate counting is an algorithm proposed by R. Morris which makes it possible to keep approximate counts of large numbers in small counters. The algorithm is useful for gathering statistics of a large number of events as well as for applications related to data compression (Todd et al.). We provide here a complete analysis of approximate counting which establishes good convergence properties of the algorithm and allows one to quantify complexity-accuracy tradeoffs precisely.

Introduction. As shown by an easy information-theoretic argument, maintaining a counter whose values may range in the interval 1 to $M$ essentially necessitates $\log_2 M$ bits. This lower bound is of course achieved by a standard binary counter. R. Morris [8] has proposed a probabilistic algorithm that maintains an approximate count using only about $\log_2 \log_2 M$ bits. This paper is devoted to a detailed analysis of characteristic parameters of that algorithm. We provide precise estimates on the probabilities of errors, from which the soundness of the method can be assessed and complexity-accuracy trade-offs can be quantified.

The algorithm itself is useful for gathering statistics on a large number of events in a storage-efficient way [5]. It was proposed for applications to data compression [9] when building an adaptive encoding scheme to represent “random” data (see e.g. [4] for adaptive Huffman codes and [7] for arithmetic coding); there, typically a large number of counters need to be maintained to gather statistics on the data to be compressed, but high accuracy of each counter is not a critical factor in the design of almost-optimal codes. In fact, Todd et al. report that the overall performance of a system using approximate counting is only a few percent off that of a reference system using exact counts. There are other cases, like database systems, where probabilistic counting methods prove useful. We mention a related algorithm, called “Probabilistic Counting”, that has been proposed in [3].
Probabilistic Counting makes it possible to determine the approximate number of distinct elements in a file in a single pass, using a few operations per element and only $O(1)$ additional storage.

Received October 1982. Revised August 1984.

The plan of the paper is as follows. We start with a simple version of the algorithm, approximate counting with base 2, which is very easy to implement on a binary computer. It appears (Theorems 1 and 2) that this algorithm can maintain an approximate count up to $M$ using about $\log_2 \log_2 M$ bits, with an error that is typically of one binary order of magnitude. The analytic techniques that we use in Sections 2, 3 and 4 involve the manipulation of generating functions related to a discrete-time birth process to which the algorithm is equivalent, certain properties of the Mellin integral transform, and finally some simple identities that properly belong to the theory of integer partitions. In Sections 5 and 6 we discuss the more general version of the algorithm with an arbitrary base. The analysis shows that, using suitable corrections, one can count up to $M$ keeping only $\log_2 \log_2 M + \delta$ bits with an accuracy of order $O(2^{-\delta/2})$.

A preliminary report on this work was presented at the International Seminar on Modelling and Performance Evaluation Methodology (“On Approximate Counting”, Proceedings, Volume 2, pp. 205-236, Paris, January 1983).

1. Approximate counting with a binary base. If the requirement of accuracy is dropped, a counter of value $n$ can be replaced by another counter $C$ containing $\lfloor \log_2 n \rfloor$, which only requires storing about $\log_2 \log_2 n$ bits. However, since the fractional part of $\log_2 n$ is no longer preserved, there now arises the problem of deciding when to update the logarithmic counter in the course of successive incrementations. The idea of [8], [9] is to base this decision on probabilistic choices. Approximate counting starts with counter $C$ initialized to 1.
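As a rough illustration of the storage argument above, the Python sketch below compares the bits a standard binary counter needs for the values $1$ to $M$ (that is, $\lceil \log_2 M \rceil$) with the bits needed to hold $\lfloor \log_2 n \rfloor$ for every $n \le M$ (about $\log_2 \log_2 M$). The function names are hypothetical, and the sizing deliberately ignores the small offset of the probabilistic counter, which starts at 1.

```python
def exact_counter_bits(m: int) -> int:
    """Bits for a standard binary counter over the m values 1..m: ceil(log2 m)."""
    # Storing n - 1 in 0..m-1 needs (m - 1).bit_length() = ceil(log2 m) bits.
    return (m - 1).bit_length()


def log_counter_bits(m: int) -> int:
    """Bits to hold floor(log2 n) for every n in 1..m: about log2 log2 m."""
    top = m.bit_length() - 1  # floor(log2 m), the largest value to store
    return top.bit_length()   # ceil(log2(top + 1)) bits encode 0..top


if __name__ == "__main__":
    for m in (10**6, 10**9, 2**32, 2**64):
        print(m, exact_counter_bits(m), log_counter_bits(m))
```

For $M = 2^{32}$ this gives 32 bits against 6, and for $M = 2^{64}$ it gives 64 bits against 7, which is the doubly logarithmic saving the text describes.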
After $n$ increments, we expect $C$ to contain a good approximation to $\lfloor \log_2 n \rfloor$; we should thus increase $C$ by 1 after approximately another $n$ increments. Since the exact value of $n$ has not been kept and only $C$ is known, the algorithm has to base its decision on the content of $C$ alone. Approximate counting then performs each incrementation by the following procedure:

    procedure increment(C : integer);
        let DELTA(C) be a random variable which takes value 1 with probability 2^(-C)
            and value 0 with probability 1 - 2^(-C);
        C := C + DELTA(C)

The interesting fact about this procedure is the following: if $C_n$ is the random variable representing the content of counter $C$ after $n$ applications of the increment procedure, then the expectation of $2^{C_n}$ bears a simple relation to $n$ (as we shall prove at the end of Section 2).
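To see the increment procedure at work, here is a minimal Python sketch, assuming the base-2 version described above with $C$ initialized to 1. A one-line conditioning argument gives $\mathbb{E}(2^{C_{n+1}} \mid C_n = c) = 2^c(1 - 2^{-c}) + 2^{c+1} \cdot 2^{-c} = 2^c + 1$, hence $\mathbb{E}(2^{C_n}) = n + 2$, so $2^{C_n} - 2$ is an unbiased estimate of $n$. All helper names are hypothetical: `exact_mean_of_2_pow_c` checks the identity with exact rational arithmetic, while `approximate_count` simulates a single counter.

```python
import random
from fractions import Fraction


def increment(c: int, rng: random.Random) -> int:
    """One application of the procedure: add DELTA(c), which is 1 with probability 2**-c."""
    return c + 1 if rng.random() < 2.0 ** -c else c


def approximate_count(n: int, seed: int = 0) -> int:
    """Content C_n of a counter initialized to 1 after n increments."""
    rng = random.Random(seed)
    c = 1
    for _ in range(n):
        c = increment(c, rng)
    return c


def estimate(c: int) -> int:
    """Unbiased estimate of n, since E(2**C_n) = n + 2."""
    return 2 ** c - 2


def exact_mean_of_2_pow_c(n: int) -> Fraction:
    """E(2**C_n), computed exactly by propagating the distribution of C_n."""
    dist = {1: Fraction(1)}  # C_0 = 1 with probability 1
    for _ in range(n):
        nxt = {}
        for c, p in dist.items():
            up = Fraction(1, 2 ** c)  # P(DELTA(c) = 1)
            nxt[c + 1] = nxt.get(c + 1, Fraction(0)) + p * up
            nxt[c] = nxt.get(c, Fraction(0)) + p * (1 - up)
        dist = nxt
    return sum((p * 2 ** c for c, p in dist.items()), Fraction(0))


if __name__ == "__main__":
    for n in (1, 10, 50):
        print(n, exact_mean_of_2_pow_c(n))  # n + 2 in each case
    runs = [estimate(approximate_count(1000, seed=s)) for s in range(500)]
    print(sum(runs) / len(runs))  # averages out near 1000
```

A single counter is only accurate to within roughly a binary order of magnitude, as the text notes; averaging many independent runs is used here only to make the unbiasedness visible.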

Similar resources

Approximate Counting via the Poisson-Laplace-Mellin Method

Approximate counting is an algorithm that provides a count of a huge number of objects within an error tolerance. The first detailed analysis of this algorithm was given by Flajolet. In this paper, we propose a new analysis via the Poisson-Laplace-Mellin approach, a method devised for analyzing shape parameters of digital search trees in a recent paper of Hwang et al. Our approach yields a diff...

Approximate counting with m counters: A detailed analysis

The classical algorithm approximate counting was recently modified by Cichon and Macyna: Instead of one counter, m counters are used, and the assignment of an incoming item to one of the counters is random. The parameter of interest is the sum of the values of all the counters. We analyse expectation and variance, getting explicit and asymptotic formulæ.

Periodic Oscillations in the Analysis of Algorithms and Their Cancellations

A large number of results in analysis of algorithms contain fluctuations. A typical result might read “The expected number of . . . for large n behaves like log2 n + constant + delta(log2 n), where delta(x) is a periodic function of period one and mean zero.” Examples include various trie parameters, approximate counting, probabilistic counting, radix exchange sort, leader election, skip lists,...

Counting on Quantifiers: Specific Links between Linguistic Quantifiers and Number Acquisition

Knowledge of linguistic quantifiers (like all, many or some) correlates with number acquisition. However, it is unclear whether quantifier comprehension is exclusively related to exact number skills or whether the relationship also extends to approximate number skills. To find out, we tested German-speaking children on a quantifier comprehension task, two counting tasks (‘How-many task’, ‘Give-...

Approximate counting with m counters: a probabilistic analysis

Motivated by a recent paper by Cichoń and Macyna [1], who introduced m counters (instead of just one) in the approximate counting scheme first analysed by Flajolet [2], we analyse the moments of the sum of the m counters, using techniques that proved to be successful already in several other contexts [11].

Journal title:
  • BIT

Volume 25

Publication date: 1985